Helmholtz Zentrum für Infektionsforschung Repository

MPG.PuRe

CAMISIM: Simulating metagenomes and microbial communities

Author: Belmann P
Bremges A
Dahms E
Darling AE
Demaere MZ
Dröge J
Fiedler J
Fritz A
Hofmann P
Lesker TR
Majda S
McHardy AC
Sczyrba A
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/04/2018

© 2019 The Author(s). Background: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. Results: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. Conclusions: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation.

OPUS - University of Technology Sydney

Publications at Bielefeld University

The Evolution of X-ray Clusters of Galaxies

Author: Allen SW
Bauer FE
Blanchard A
Cavaliere A
Colin Norman
Della Ceca R
Edge AC
Ettori S
Evrard AE
Holden B
Kolb KT
Mazure A
McHardy IM
Monaco P
Olsen LF
Oukbir J
Peacock JA
Peebles PJE
Pentericci L
Perlman ES
Piero Rosati
Pipino A
Rothschild R
Sadat R
Sarazin C
Seljak U
Stefano Borgani
Valageas P
Zwicky F
Publication venue: 'Annual Reviews'
Publication date: 01/01/2002

Considerable progress has been made over the last decade in the study of the evolutionary trends of the population of galaxy clusters in the Universe. In this review we focus on observations in the X-ray band. X-ray surveys with the ROSAT satellite, supplemented by follow-up studies with ASCA and Beppo-SAX, have allowed an assessment of the evolution of the space density of clusters out to z~1, and the evolution of the physical properties of the intra-cluster medium out to z~0.5. With the advent of Chandra and Newton-XMM, and their unprecedented sensitivity and angular resolution, these studies have been extended beyond redshift unity and have revealed the complexity of the thermodynamical structure of clusters. The properties of the intra-cluster gas are significantly affected by non-gravitational processes including star formation and Active Galactic Nucleus (AGN) activity. Convincing evidence has emerged for modest evolution of both the bulk of the X-ray cluster population and their thermodynamical properties since redshift unity. Such an observational scenario is consistent with hierarchical models of structure formation in a flat low density universe with Omega_m=0.3 and sigma_8=0.7-0.8 for the normalization of the power spectrum. Basic methodologies for construction of X-ray-selected cluster samples are reviewed and implications of cluster evolution for cosmological models are discussed. Comment: 40 pages, 15 figures. Full resolution figures can be downloaded from http://www.eso.org/~prosati/ARAA

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

Archivio istituzionale della ricerca - Università di Ferrara

CERN Document Server

Translational web robots for pathogen genome analysis

Author: A Kahvejian
AC McHardy
C Hyland
D Parks
G Mariscal
J Shon
JW Huss
M Haeussler
OG Pybus
PS Dehal
SM Leach
T Davidsen
T Oinn
V Sintchenko
VM Markowitz
Y Kano
Publication venue: BioMed Central
Publication date: 01/01/2011

4 page(s)

Macquarie University ResearchOnline

Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG

Author: AC McHardy
Andreas Wilke
BE Dutilh
C Lozupone
C von Mering
D Benson
Daniel C Richter
Daniel H Huson
DH Huson
F Meyer
Folker Meyer
H Teeling
JA Gilbert
Jack A Gilbert
L Krause
M Kanehisa
Paul Rupek
R Overbeek
S Mitra
SF Altschul
Suparna Mitra
Tim Urich
VM Markowitz
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Metagenomics is the study of microbial organisms using sequencing applied directly to environmental samples. Technological advances in next-generation sequencing methods are fueling a rapid increase in the number and scope of metagenome projects. While metagenomics provides information on the gene content, metatranscriptomics aims at understanding gene expression patterns in microbial communities. The initial computational analysis of a metagenome or metatranscriptome addresses three questions: (1) Who is out there? (2) What are they doing? and (3) How do different datasets compare? There is a need for new computational tools to answer these questions. In 2007, the program MEGAN (MEtaGenome ANalyzer) was released as a standalone interactive tool for analyzing the taxonomic content of a single metagenome dataset. The program has subsequently been extended to support comparative analyses of multiple datasets. Results: The focus of this paper is to report on new features of MEGAN that allow the functional analysis of multiple metagenomes (and metatranscriptomes) based on the SEED hierarchy and KEGG pathways. We have compared our results with the MG-RAST service for different datasets. Conclusions: The MEGAN program now allows the interactive analysis and comparison of the taxonomical and functional content of multiple datasets. As a stand-alone tool, MEGAN provides an alternative to web portals for scientists who have concerns about uploading their unpublished data to a website.

University of Bergen

NORA - Norwegian Open Research Archives

White Rose Research Online

ScholarBank@NUS

Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences

Author: AC Martiny
AC McHardy
B Snel
BD Muegge
BJ Haas
CJ Meehan
CL Hemme
CS Smillie
D McDonald
DB Rasher
DH Parks
E Paradis
EK Costello
F Meyer
G Suen
I Cho
J Kuczynski
J Xu
JE Smith
JG Caporaso
JK Harris
JR Zaneveld
KL Barott
KT Konstantinidis
M Csuros
M Kanehisa
M Zuniga
N Fierer
N Knowlton
N Segata
P Gajer
PV Patel
R Knight
RC Edgar
RE Collins
RL Tatusov
S Abubucker
S Chaffron
S Federhen
SW Kembel
T Daniluk
TZ DeSantis
V Kunin
VM Markowitz
XC Morgan
Y Boucher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2013

Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community’s functional capabilities. Here we describe PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this ‘predictive metagenomic’ approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.
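
The step of combining predicted gene families into a composite metagenome can be illustrated with a minimal sketch. This is not PICRUSt's actual code: the function name, the dictionary layout, and the toy OTU and gene-family identifiers are all assumptions made for illustration, and the ancestral-state reconstruction that would produce the per-OTU predictions is taken as already done.

```python
from collections import defaultdict


def estimate_metagenome(otu_abundance, predicted_gene_counts):
    """Sum predicted gene-family copies over OTUs, weighted by OTU abundance.

    otu_abundance: hypothetical mapping OTU id -> abundance in one sample.
    predicted_gene_counts: hypothetical mapping OTU id -> {gene family: copies},
        assumed to come from a prior prediction step.
    """
    metagenome = defaultdict(float)
    for otu, abundance in otu_abundance.items():
        # OTUs without a prediction contribute nothing in this sketch.
        for family, copies in predicted_gene_counts.get(otu, {}).items():
            metagenome[family] += abundance * copies
    return dict(metagenome)


# Toy input, invented for illustration only.
otu_abundance = {"otu_a": 10, "otu_b": 5}
predicted_gene_counts = {
    "otu_a": {"K00001": 2, "K00002": 1},
    "otu_b": {"K00001": 1},
}
result = estimate_metagenome(otu_abundance, predicted_gene_counts)
```

In this toy case the family present in both OTUs accumulates contributions from each, while the family predicted in only one OTU is scaled by that OTU's abundance alone.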

Harvard University - DASH

DigitalCommons@Florida International University

eScholarship - University of California

Analysis and comparison of very large metagenomes with fast clustering and functional annotation

Author: AC McHardy
AR Quinlan
B Rodriguez-Brito
D Sheskin
DB Rusch
DC Richter
DH Huson
E Portugaly
EA Dinsdale
EF DeLong
FE Angly
GW Tyson
H Noguchi
H Teeling
J Shendure
JC Venter
K Mavromatis
KJ Hoff
L Krause
PD Schloss
R Seshadri
RK Aziz
S Yooseph
SF Altschul
SG Tringe
SR Eddy
SR Gill
W Li
Weizhong Li
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

Clustering metagenomic sequences with interpolated Markov models

Author: A Brady
A Kislyuk
A McHardy
AC McHardy
AL Delcher
B Rodriguez-Brito
CKK Chan
D Huson
D Rusch
D Wu
DA Benson
David R Kelley
EA Grice
EK Costello
G Celeux
G Dick
GW Tyson
H Teeling
J Bohlin
J Morgan
J Mrazek
J Qin
J Shi
J White
JA Eisen
JG Lawrence
K Chen
K Liolios
K Mavromatis
L Hubert
LB Koski
M Hamady
M Wu
MM Haque
N Diaz
P Smyth
P Tan
R Durbin
R Sandberg
S Chatterji
S Karlin
S Kosakovsky Pond
S Mann
S Navlakha
SF Altschul
SJ Lee
SL Salzberg
Steven L Salzberg
T Abe
W Gerlach
YW Wu
Z Weinberg
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available. Conclusions: SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.

Digital Repository at the University of Maryland

Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. 
The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability.
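
The abstract above mentions nucleotide word-usage algorithms for assigning reads to source organisms. As a rough illustration of the underlying idea only, the sketch below computes a normalized k-mer (word) frequency profile for a single sequence. It is not the method used in the study: the function name, the default word length of 4, and the handling of ambiguous bases are assumptions.

```python
from collections import Counter
from itertools import product


def word_usage(sequence, k=4):
    """Return the relative frequency of every k-letter DNA word in a sequence.

    Words containing characters other than A, C, G, T are ignored
    (an assumption of this sketch).
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


# Toy example with dinucleotides.
profile = word_usage("ACGTACGT", k=2)
```

Profiles of this kind can be compared between a read and candidate reference sequences; how that comparison is done, and with which word length, is outside what the abstract states.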

Huskie Commons

Public Library of Science (PLOS)

WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

Author: A Campbell
A Schlüter
AC McHardy
Alexander Goesmann
C Simon
CR Woese
D Pushkarev
DH Huson
DR Bentley
EA Dinsdale
EK Wommack
F Meyer
Felix Tille
GW Tyson
H Teeling
J Felsenstein
JC Dohm
JC Venter
Jens Stoye
L Krause
M Ashburner
M Breitbart
N Saitou
NN Diaz
PD Schloss
R Durbin
R Rosenkranz
S Karlin
SA Sandin
Sebastian Jünemann
SF Altschul
SG Tringe
SJ Giovannoni
SR Eddy
T Abe
W Gish
Wolfgang Gerlach
Publication venue: BioMed Central
Publication date: 01/01/2009

Gerlach W, Jünemann S, Tille F, Goesmann A, Stoye J. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads. BMC Bioinformatics. 2009;10(1):430. Background: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less effort and cost than the traditional Sanger technique. But the data produced come in even shorter reads (35-100 base pairs with Illumina, 100-500 base pairs with 454 sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads. Results: In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. Conclusions: We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.d